Gene selection via the BAHSIC family of algorithms
نویسندگان
چکیده
MOTIVATION Identifying significant genes among thousands of sequences on a microarray is a central challenge for cancer research in bioinformatics. The ultimate goal is to detect the genes that are involved in disease outbreak and progression. A multitude of methods have been proposed for this task of feature selection, yet the selected gene lists differ greatly between different methods. To accomplish biologically meaningful gene selection from microarray data, we have to understand the theoretical connections and the differences between these methods. In this article, we define a kernel-based framework for feature selection based on the Hilbert-Schmidt independence criterion and backward elimination, called BAHSIC. We show that several well-known feature selectors are instances of BAHSIC, thereby clarifying their relationship. Furthermore, by choosing a different kernel, BAHSIC allows us to easily define novel feature selection algorithms. As a further advantage, feature selection via BAHSIC works directly on multiclass problems. RESULTS In a broad experimental evaluation, the members of the BAHSIC family reach high levels of accuracy and robustness when compared to other feature selection techniques. Experiments show that features selected with a linear kernel provide the best classification performance in general, but if strong non-linearities are present in the data then non-linear kernels can be more suitable. AVAILABILITY Accompanying homepage is http://www.dbs.ifi.lmu.de/~borgward/BAHSIC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
منابع مشابه
The BAHSIC family of gene selection algorithms
Identifying significant genes among thousands of sequences on a microarray is a central challenge for cancer research in bioinformatics. A multitude of methods have been proposed for this task of feature selection, yet the selected gene lists differ greatly between different methods. To accomplish biologically meaningful gene selection from microarray data, we have to understand the theoretical...
متن کاملA New Hybrid Method for Improving the Performance of Myocardial Infarction Prediction
Abstract Introduction: Myocardial Infarction, also known as heart attack, normally occurs due to such causes as smoking, family history, diabetes, and so on. It is recognized as one of the leading causes of death in the world. Therefore, the present study aimed to evaluate the performance of classification models in order to predict Myocardial Infarction, using a feature selection method tha...
متن کاملThe prediction of lymphedema via the combination of the selected data mining algorithms
Background: Breast cancer is the second leading cause of cancer death in women, after lung cancer. Due to the importance of predicting this disease, the use of data mining methods in medical research is more significant than before. Data mining algorithms can be a great help in preventing the development of lymphedema in patients. The aim Of this study was to create a diagnosis system that can ...
متن کاملA stochastic model for project selection and scheduling problem
Resource limitation in zero time may cause to some profitable projects not to be selected in project selection problem, thus simultaneous project portfolio selection and scheduling problem has received significant attention. In this study, budget, investment costs and earnings are considered to be stochastic. The objectives are maximizing net present values of selected projects and minimizing v...
متن کاملAn Expert System for Intelligent Selection of Proper Particle Swarm Optimization Variants
Regarding the large number of developed Particle Swarm Optimization (PSO) algorithms and the various applications for which PSO has been used, selecting the most suitable variant of PSO for solving a particular optimization problem is a challenge for most researchers. In this paper, using a comprehensive survey and taxonomy on different types of PSO, an Expert System (ES) is designed to identif...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Bioinformatics
دوره 23 13 شماره
صفحات -
تاریخ انتشار 2007